PROSPECT features and their application to missing data techniques for robust speech recognition
Author
Abstract
Missing data theory has been applied to the problem of speech recognition in adverse environments. The resulting systems require acoustic models that are expressed in the spectral rather than in the cepstral domain, which leads to loss of accuracy. Cepstral Missing Data Techniques (CMDT) surmount this disadvantage, but require significantly more computation. In this paper, we study alternatives to the cepstral representation that lead to more efficient MDT systems. The proposed solution, PROSPECT features (Projected Spectra), can be interpreted as a novel speech representation, or as an approximation of the inverse covariance (precision) matrix of the Gaussian distributions modeling the log-spectra.
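As a rough illustration of why spectral-domain models are convenient for missing data techniques, the sketch below evaluates a diagonal-covariance Gaussian on log-spectral features while marginalizing over components flagged as unreliable. It is a minimal sketch of plain marginalization, not the PROSPECT method or the CMDT system described in the abstract. The function name, the `reliable` mask, and the diagonal-covariance assumption are all hypothetical choices made for the example.

```python
import math

def marginal_log_likelihood(x, reliable, mean, var):
    """Log-likelihood of the reliable components of x under a diagonal Gaussian.

    Assumption for this sketch: with a diagonal covariance, integrating out a
    missing component simply drops its term from the sum.
    """
    total = 0.0
    for xi, ok, mu, v in zip(x, reliable, mean, var):
        if ok:  # only reliable spectral components contribute
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total

# Hypothetical 3-channel log-spectrum; the second channel is masked as noisy.
score = marginal_log_likelihood(
    x=[1.0, 7.5, -0.2],
    reliable=[True, False, True],
    mean=[0.8, 0.0, 0.0],
    var=[1.0, 1.0, 0.5],
)
```

With a full covariance or a cepstral representation, the components no longer separate this way, which is the source of the extra computation that the abstract attributes to cepstral approaches.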
Similar resources
A New Method for Robust Speech Recognition Based on Missing Data Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have demonstrated their performance in speech recognition systems, both for feature extraction and for acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...
PROSPECT features and their application to missing data techniques for vocal tract length normalization
Speaker normalization by (piecewise) linear warping of the frequency axis is a popular method because of its simplicity and effectiveness. However, when this so-called vocal tract length normalization is applied to map test speakers with a shorter vocal tract onto acoustic models trained on speakers with a longer vocal tract, there is important information missing in the frequency bins at the h...
Recognition Based on Missing Feature Techniques
Solutions for two important problems for the deployment of noise-robust large vocabulary automatic speech recognizers using the missing data paradigm are presented. The first problem is the generation of missing data masks. We propose and evaluate a method based on vector quantization and harmonicity that successfully exploits the characteristics of speech while requiring only weak assumptions on th...
Classification of emotional speech using spectral pattern features
Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...
Journal title:
Volume, issue
Pages -
Publication date: 2004